Case study: how do features of nesting female horseshoe crabs influence the number of males found nearby?
Load the data. The first six of its 173 rows are shown below:
| colour | spine | width | weight | n_male |
|--------|-------|-------|--------|--------|
| 2 | 3 | 28.3 | 3.05 | 8 |
| 3 | 3 | 26.0 | 2.60 | 4 |
| 3 | 3 | 25.6 | 2.15 | 0 |
| 4 | 2 | 21.0 | 1.85 | 0 |
| 2 | 3 | 29.0 | 3.00 | 1 |
| 1 | 2 | 25.0 | 2.30 | 3 |
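As a rough sketch of the loading step, the six rows shown above can be parsed with base R as follows. It assumes the raw file is whitespace-delimited, has no header, and carries a leading row-index column that is dropped; the object names `raw_text` and `crab_head` are illustrative only. For the full data set, a file path or URL would be passed through `file =` instead of `text =`.

```r
# The six rows displayed above, with an ASSUMED leading row-index column.
raw_text <- "
1 2 3 28.3 3.05 8
2 3 3 26.0 2.60 4
3 3 3 25.6 2.15 0
4 4 2 21.0 1.85 0
5 2 3 29.0 3.00 1
6 1 2 25.0 2.30 3
"

# Parse the text; blank lines are skipped by default.
crab_head <- read.table(text = raw_text, header = FALSE)

# Drop the index column and attach readable column names.
crab_head <- crab_head[, -1]
names(crab_head) <- c("colour", "spine", "width", "weight", "n_male")

# Colour and spine condition are category codes, so store them as factors.
crab_head$colour <- factor(crab_head$colour)
crab_head$spine  <- factor(crab_head$spine)

print(crab_head)
```

Storing `colour` and `spine` as factors keeps later model fits from treating the category codes as numeric quantities.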
Predictors: Colour; spine condition; carapace width; weight.
First, let’s see how carapace width influences the mean number of males nearby.
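Data source: H. Jane Brockmann’s 1996 paper (https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-0310.1996.tb01099.x); the data file is available online (https://newonlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson07/crab/index.txt); another regression demo using this data is at https://newonlinecourses.science.psu.edu/stat504/node/169/.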
Approach 1: Estimate regression curve / model function locally
Preliminary questions
These questions are meant to check your understanding of local regression.
What is the estimated mean number of nearby males for nesting females having a carapace width of 32.5? Use the following methods, by hand.
1. kNN with \(k=3\).
2. Using a moving window with a radius of 2.4.
3. Using a kernel smoother with a Gaussian kernel of variance 1.
4. Using local polynomials with a radius of 2.4 and a flat kernel, first with degree 1, then with degree 2.
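The four methods listed above can be sketched as small helper functions, which may be handy for checking hand calculations. The function names (`knn_mean`, `window_mean`, `kernel_mean`, `local_poly`) are hypothetical, and the demo uses only the six rows displayed earlier with a query point of 26, not the full data at 32.5, so its output is not an answer to the exercise.

```r
# Illustration only: the six displayed rows, not the full 173-row data set.
toy <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

# 1. kNN: average the responses of the k observations closest to x0.
#    Ties in distance are broken by row order.
knn_mean <- function(data, x0, k) {
  nearest <- order(abs(data$width - x0))[seq_len(k)]
  mean(data$n_male[nearest])
}

# 2. Moving window: average the responses with |width - x0| <= radius.
window_mean <- function(data, x0, radius) {
  inside <- abs(data$width - x0) <= radius
  mean(data$n_male[inside])
}

# 3. Gaussian kernel smoother: weighted mean with weights from a normal
#    density centred at x0 (variance 1 corresponds to sd = 1).
kernel_mean <- function(data, x0, sd = 1) {
  w <- dnorm(data$width - x0, sd = sd)
  sum(w * data$n_male) / sum(w)
}

# 4. Local polynomial with a flat kernel: unweighted least squares on the
#    points inside the window. Centring at x0 makes the intercept the
#    estimate at x0.
local_poly <- function(data, x0, radius, degree) {
  local <- data[abs(data$width - x0) <= radius, ]
  X <- outer(local$width - x0, 0:degree, `^`)
  unname(lm.fit(X, local$n_male)$coefficients[1])
}

x0 <- 26  # demo query point; the exercise uses 32.5 on the full data
knn_mean(toy, x0, k = 3)
window_mean(toy, x0, radius = 2.4)
kernel_mean(toy, x0)
local_poly(toy, x0, radius = 2.4, degree = 1)
local_poly(toy, x0, radius = 2.4, degree = 2)
```

With the full data frame in place of `toy`, the same calls with `x0 = 32.5` would give values to compare against the by-hand results.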
Fit a smoother by eye
Optimize the loess fit by eye. To keep things simple, modify only the span.
grid <- seq(min(crab$width), max(crab$width), length.out=100)
grid_df <- tibble(width = grid)
# FIT_MODEL_HERE
# PLOT_CURVE_HERE
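One possible shape for the fit-and-plot step is sketched below. Because the full data set is not reproduced here, the sketch runs on simulated stand-in data (`sim`), which is an assumption made purely for illustration and is not the crab data; the span of 0.6 is likewise an arbitrary starting value. It uses `stats::loess` and base graphics rather than ggplot2 to keep dependencies minimal.

```r
# SIMULATED stand-in data (NOT the crab data): 173 widths in a plausible
# range, with counts drawn from a Poisson whose mean rises with width.
set.seed(1)
sim <- data.frame(width = runif(173, min = 21, max = 33.5))
sim$n_male <- rpois(173, lambda = exp(-3 + 0.15 * sim$width))

# Evaluation grid spanning the observed widths.
grid_sim <- data.frame(
  width = seq(min(sim$width), max(sim$width), length.out = 100)
)

# Fit loess with a chosen span; smaller spans give wigglier curves.
fit <- loess(n_male ~ width, data = sim, span = 0.6)
grid_sim$fitted <- predict(fit, newdata = grid_sim)

# Overlay the fitted curve on the scatterplot.
plot(sim$width, sim$n_male, col = "grey50",
     xlab = "Carapace width", ylab = "No. males nearby")
lines(grid_sim$width, grid_sim$fitted, lwd = 2)
```

Re-running the fit with several `span` values and comparing the curves is one way to tune it by eye.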
What’s the error of this model? Training error is fine.
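One common way to make "error" concrete, assuming squared-error loss (the question does not name a specific metric), is the training mean squared error. Here \(y_i\) is the observed number of nearby males for crab \(i\), \(x_i\) is its carapace width, \(\hat{m}\) is the fitted curve, and \(n\) is the number of crabs used in the fit:

\[
\mathrm{MSE}_{\text{train}} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{m}(x_i) \bigr)^2
\]

Since the same observations are used to fit and to evaluate, this quantity typically understates the error on new data, and it tends to shrink as the span decreases.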
How well does this model answer our original question?
Approach 2: Linear Regression
Fit a linear regression model
Fit a linear regression model. What’s the error?
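A minimal sketch of the fit, run here on only the six rows displayed earlier so that it is self-contained; the numbers it produces are illustrative and say nothing reliable about the full 173-row data. Squared-error loss for the "error" is an assumption, and `toy`, `fit_lm`, and `train_mse_lm` are illustrative names.

```r
# Illustration only: the six displayed rows, not the full data set.
toy <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

# Ordinary least squares: mean number of males as a straight line in width.
fit_lm <- lm(n_male ~ width, data = toy)
coef(fit_lm)

# Training mean squared error (squared-error loss is an assumption).
train_mse_lm <- mean((toy$n_male - fitted(fit_lm))^2)
train_mse_lm

# Fitted means over a grid of widths, for plotting or inspection.
predict(fit_lm, newdata = data.frame(width = seq(21, 29, by = 2)))
```

Replacing `toy` with the full data frame gives the model the exercise asks for.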
How well does this model answer our original question? Do you see a potential problem with this model? Are any assumptions of linear regression violated? Brainstorm ideas for how to deal with the problems.
Approach 3: Link Function
Fit a GLM. What’s the error?
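The text does not say which family or link to use; the sketch below assumes a Poisson family with a log link, a common choice for count responses. As before, it runs on only the six rows displayed earlier, so the output is illustrative, and the error is measured as training mean squared error on the response scale (also an assumption).

```r
# Illustration only: the six displayed rows, not the full data set.
toy <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

# ASSUMED model: Poisson GLM with log link, so that
# log(mean number of males) is linear in width.
fit_glm <- glm(n_male ~ width, family = poisson(link = "log"), data = toy)
coef(fit_glm)

# Fitted means on the response scale (the inverse link is already applied).
mu_hat <- fitted(fit_glm)

# Training mean squared error, comparable with the other approaches.
train_mse_glm <- mean((toy$n_male - mu_hat)^2)
train_mse_glm
```

Computing the error on the response scale, rather than on the log scale, keeps it directly comparable with the loess and linear regression errors.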